Threshold optimization and random undersampling for imbalanced credit card data

Authors

Abstract

Output thresholding is well-suited for addressing class imbalance, since the technique does not increase dataset size, run the risk of discarding important instances, or modify an existing learner. Through the use of the Credit Card Fraud Detection Dataset, this study proposes a threshold optimization approach that factors in the constraint True Positive Rate (TPR) ≥ True Negative Rate (TNR). Our findings indicate that a higher Area Under the Precision–Recall Curve (AUPRC) score is associated with an improvement in threshold-based classification scores, while a higher positive class prior probability causes optimal thresholds to increase. In addition, we discovered that the best overall results for threshold selection are obtained without Random Undersampling (RUS). Furthermore, with the exception of AUPRC, we established that the default threshold yields good performance scores at a balanced class ratio. Our evaluation of four threshold optimization techniques, eight threshold-dependent metrics, and two threshold-agnostic metrics defines the uniqueness of this research.
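The constrained threshold selection described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy labels and scores, the function names `tpr_tnr` and `select_threshold`, the 0.01 candidate grid, and the choice of the geometric mean of TPR and TNR as the objective. The paper evaluates four optimization techniques that this listing does not spell out. The sketch also assumes both classes are present in the labels.

```python
import numpy as np


def tpr_tnr(y_true, scores, threshold):
    """Return (TPR, TNR) when scores >= threshold are labelled positive."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(scores) >= threshold
    tp = np.sum(y_pred & y_true)    # frauds flagged as fraud
    tn = np.sum(~y_pred & ~y_true)  # legitimate kept as legitimate
    return tp / np.sum(y_true), tn / np.sum(~y_true)


def select_threshold(y_true, scores, candidates=None):
    """Pick the candidate threshold with the highest G-mean among those
    satisfying TPR >= TNR (hypothetical objective, for illustration only)."""
    if candidates is None:
        candidates = np.linspace(0.0, 1.0, 101)  # assumed grid, step 0.01
    best_t, best_g = None, -1.0
    for t in candidates:
        tpr, tnr = tpr_tnr(y_true, scores, t)
        if tpr < tnr:  # constraint from the abstract: TPR >= TNR
            continue
        g = float(np.sqrt(tpr * tnr))
        if g > best_g:  # strict '>' keeps the lowest tied threshold
            best_t, best_g = float(t), g
    return best_t, best_g


# Toy imbalanced data: 8 legitimate (0) and 2 fraudulent (1) transactions.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.025, 0.055, 0.085, 0.115, 0.125, 0.155, 0.205, 0.355,
          0.305, 0.605]

default_tpr, default_tnr = tpr_tnr(y_true, scores, 0.5)
best_t, best_g = select_threshold(y_true, scores)
print("default 0.5 -> TPR, TNR:", default_tpr, default_tnr)
print("selected threshold, G-mean:", best_t, best_g)
```

On this toy data the default threshold of 0.5 gives a TPR of 0.5 against a TNR of 1.0, so it violates the constraint, and the search settles on a much lower threshold. This matches the general motivation for output thresholding on skewed data, but it says nothing about the paper's actual numbers.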

Related resources

Credit Card Fraud Detection using Data mining and Statistical Methods

Due to today’s advancement in technology and businesses, fraud detection has become a critical component of financial transactions. Considering vast amounts of data in large datasets, it becomes more difficult to detect fraud transactions manually. In this research, we propose a combined method using both data mining and statistical tasks, utilizing feature selection, resampling and cost-...

Fast and Effective Credit Card Fraud Detection in Imbalanced Data using Parallel Hybrid PSO

Credit card fraud detection has been one of the major necessities of the current e-commerce based world. The ease of use provided by e-commerce transactions is hindered by the threat caused by fraudsters. Several models have been proposed for identifying fraudulent transactions in a credit card system. However, the threats still tend to exist. This paper discusses and analyzes the major reaso...

Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling

In the classification framework there are problems in which the number of examples per class is not equitably distributed, formerly known as imbalanced data sets. This situation is a handicap when trying to identify the minority classes, as the learning algorithms are not usually adapted to such characteristics. A usual approach to deal with the problem of imbalanced data sets is the use of a ...

Combination of Ensemble Data Mining Methods for Detecting Credit Card Fraud Transactions

As we know, credit cards speed up and make life easier for all citizens and bank customers. They can use it anytime and anyplace according to their personal needs, instantly and quickly and without hassle, without worrying about carrying a lot of cash and more security than having liquidity. Together, these factors make credit cards one of the most popular forms of online banking. This has led ...

Evolutionary Undersampling for Classification with Imbalanced Datasets: Proposals and Taxonomy

Learning with imbalanced data is one of the recent challenges in machine learning. Various solutions have been proposed in order to find a treatment for this problem, such as modifying methods or the application of a preprocessing stage. Within the preprocessing focused on balancing data, two tendencies exist: reduce the set of examples (undersampling) or replicate minority class examples (over...

Journal

Journal title: Journal of Big Data

Year: 2023

ISSN: 2196-1115

DOI: https://doi.org/10.1186/s40537-023-00738-z